Yannic Kilcher

[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Yannic Kilcher on PhDs for ML #shorts

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)

Were RNNs All We Needed? (Paper Explained)

Yannic Kilcher on superintelligence #machinelearning

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained)

My GitHub (Trash code I wrote during PhD)

Grokking: Generalization beyond Overfitting on small algorithmic datasets (Paper Explained)

Attention Is All You Need

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools (Paper Explained)

Flow Matching for Generative Modeling (Paper Explained)

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

JEPA - A Path Towards Autonomous Machine Intelligence (Paper Explained)

What is Q-Learning (back to basics)

Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained)

Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)

Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Paper Explained)

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Paper Explained)

No, Anthropic's Claude 3 is NOT sentient

xLSTM: Extended Long Short-Term Memory

GPT-4chan: This is the worst AI ever

Hopfield Networks is All You Need (Paper Explained)

OpenAI CLIP: Connecting Text and Images (Paper Explained)